Scientific workflow management and the Kepler system

Authors

  • Bertram Ludäscher
  • Ilkay Altintas
  • Chad Berkley
  • Dan Higgins
  • Efrat Jaeger
  • Matthew B. Jones
  • Edward A. Lee
  • Jing Tao
  • Yang Zhao
Abstract

Many scientific disciplines are now data- and information-driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery “pipelines”. A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. “the Grid”). However, this infrastructure is only a means to an end, and scientists ideally should be bothered little with its existence. The goal is for scientists to focus on the development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access and querying steps, data analysis and mining steps, and many other steps, including computationally intensive jobs on high-performance cluster computers. In this paper we describe characteristics of and requirements for scientific workflows as identified in a number of our application projects. We then elaborate on Kepler, a particular scientific workflow system currently under development across a number of scientific data management projects. We describe some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research. Kepler is a community-driven, open-source project, and we always welcome related projects and new contributors to join.

Work supported by NSF/ITR 0225676 (SEEK), DOE SciDAC DE-FC02-01ER25486 (SDM), NSF/ITR CCR-00225610 (Chess), NSF/ITR 0225673 (GEON), NIH/NCRR 1R24 RR019701-01 Biomedical Informatics Research Network Coordinating Center (BIRN-CC), NSF/ITR 0325963 (ROADNet), and NSF/DBI-0078296 (Resurgence).

Author affiliations: San Diego Supercomputer Center, UC San Diego; Dept. of Computer Science & Genome Center, UC Davis; National Center for Ecological Analysis and Synthesis, UC Santa Barbara; and Department of Electrical Engineering and Computer Sciences, UC Berkeley.

Similar resources

IVIP - A Scientific Workflow System to Support Experts in Spatial Planning of Crop Production

Decision making for crop production planning is essentially driven by location-based or more precisely by space-oriented information. Therefore, farmers and regional experts in the field mostly rely on new spatial-data-oriented decision making tools. IVIP is a prototype for a Web-based Spatial Decision Support System (WSDSS) demonstrating the benefits of location-based decision making using dig...

Kurator: A Kepler Package for Data Curation Workflows

Data curation is critical for scientific data digitization, sharing, integration, and use. This paper presents Kurator, a software package for automating data curation pipelines in the Kepler scientific workflow system. Several curation tools and services are integrated into this package as actors to enable construction of workflows to perform and document various data curation tasks. The integ...

A Taxonomy on Tools for Scientific Workflow Management System

Scientific workflow management systems (SWFMSs) have been shown to be important to scientific computing and services computing [4][5][6][7], as they provide functionalities such as workflow determination, process coordination, job scheduling and execution, provenance discovery and error resistance. Systems such as Pegasus [11], Taverna [8], Swift [12], VisTrails [10], Kepler [9] have seen wide accepta...

Early Cloud Experiences with the Kepler Scientific Workflow System

With the increasing popularity of Cloud computing, there are more and more requirements for scientific workflows to utilize Cloud resources. In this paper, we present our preliminary work and experiences on enabling the interaction between the Kepler scientific workflow system and the Amazon Elastic Compute Cloud (EC2). A set of EC2 actors and Kepler Amazon Machine Images are introduced wit...

Knowledge Annotations in Scientific Workflows: An Implementation in Kepler

Scientific research products are the result of long-term collaborations between teams. Scientific workflows are capable of helping scientists in many ways including collecting information about how research was conducted (e.g., scientific workflow tools often collect and manage information about datasets used and data transformations). However, knowledge about why data was collected is rarely d...

Scientific Workflow Interoperability Evaluation

There is a wide range of scientific workflow systems today, each one designed to resolve problems at a specific level. In large collaborative projects, it is often necessary to recognize the heterogeneous workflow systems already in use by various partners, and any potential collaboration between these systems requires workflow interoperability. Publish/Subscribe Scientific Workflow Interoperabili...

Journal:
  • Concurrency and Computation: Practice and Experience

Volume 18

Publication date: 2006